0b8aff0438617c055eb55f0ba5d226fa-Supplemental.pdf
In this supplemental material, we first present the detailed network architecture and parameters of the proposed approach in Sec. A. We further provide more analysis of the proposed method and ablation studies in Sec. B. Section C shows some qualitative results for potential applications of the proposed approach in medical imaging and astronomical imaging.
Figure 6: Illustration of learned deep features. (a) The blurry input and ground truth are shown in Figure 1(a)-(b).
However, one may wonder whether the feature extraction network acts as a denoiser, which would account for the observed robustness of the proposed method to various noise levels.
Can Simple Averaging Defeat Modern Watermarks? Pei Yang
For some algorithms like Tree-Ring watermarks, the extracted pattern can also forge convincing watermarks on clean images. Our quantitative and qualitative evaluations across twelve watermarking methods highlight the threat posed by steganalysis to content-agnostic watermarks and the importance of designing watermarking techniques resilient to such analytical attacks.
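The title's question can be made concrete with a toy sketch. It assumes (hypothetically) that a content-agnostic watermark behaves like the same additive pattern in every image, and that image content is zero-mean and varies independently across images. Under those assumptions the pixel-wise mean of many watermarked images converges to the pattern, which can then be subtracted to strip the mark or added to a clean image to forge one. Real schemes such as Tree-Ring embed their signal differently, so this illustrates the principle only, not the paper's attack.

```python
import random


def estimate_pattern(images):
    """Average equally sized flat images pixel by pixel.

    Assumes each image = content + fixed pattern, with zero-mean content
    that varies across images, so the content averages out.
    """
    n = len(images)
    return [sum(pixels) / n for pixels in zip(*images)]


def remove_pattern(image, pattern):
    """Subtract the estimated pattern (removal attack)."""
    return [p - w for p, w in zip(image, pattern)]


def forge_pattern(image, pattern):
    """Add the estimated pattern to a clean image (forgery attack)."""
    return [p + w for p, w in zip(image, pattern)]


if __name__ == "__main__":
    rng = random.Random(0)
    size, n_images = 16, 2000
    # Hypothetical content-agnostic watermark: a fixed +/-0.1 pattern.
    true_pattern = [0.1 if rng.random() < 0.5 else -0.1 for _ in range(size)]
    # Synthetic "content": zero-mean Gaussian noise, different per image.
    watermarked = [
        [rng.gauss(0.0, 1.0) + w for w in true_pattern] for _ in range(n_images)
    ]
    est = estimate_pattern(watermarked)
    max_err = max(abs(e - t) for e, t in zip(est, true_pattern))
    print(f"max estimation error: {max_err:.3f}")
```

Under these assumptions the estimation error shrinks roughly as one over the square root of the number of images, so the attack needs many samples but no knowledge of how the pattern was generated.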
Variational Denoising Network: Toward Blind Noise Modeling and Removal
Zongsheng Yue, Hongwei Yong, Qian Zhao, Deyu Meng, Lei Zhang
On one hand, like other data-driven deep learning methods, our method, namely the variational denoising network (VDN), can perform denoising efficiently owing to the explicit form of its posterior expression. On the other hand, VDN inherits the advantages of traditional model-driven approaches, especially the good generalization capability of generative models.
Trash or Treasure? An Interactive Dual-Stream Strategy for Single Image Reflection Separation
Existing deep learning based solutions typically restore the target layers individually, or let them interact only at the output end, barely taking into account the interaction across the two streams/branches. In order to utilize information more efficiently, this work presents a general yet simple interactive strategy, namely your trash is my treasure (YTMT), for constructing dual-stream decomposition networks.
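The source does not spell out the exchange rule. Assuming, as the strategy's name suggests, that a stream's "trash" is the part of its activation a ReLU discards (the negative values) and that this part is handed to the other stream, one interaction step might look like the sketch below. Merging by addition is also an assumption (concatenation would be an equally plausible choice), and features are flat lists to keep the sketch dependency-free.

```python
def relu(xs):
    """Keep the positive part (what a stream normally retains)."""
    return [max(x, 0.0) for x in xs]


def negative_part(xs):
    """The part a ReLU discards: x - relu(x), i.e. min(x, 0)."""
    return [min(x, 0.0) for x in xs]


def ytmt_exchange(stream_a, stream_b):
    """One hypothetical interaction step between two feature streams.

    Each stream keeps its own positive activations and receives the
    negative ("trash") activations of the other stream, merged by addition.
    """
    out_a = [p + n for p, n in zip(relu(stream_a), negative_part(stream_b))]
    out_b = [p + n for p, n in zip(relu(stream_b), negative_part(stream_a))]
    return out_a, out_b


if __name__ == "__main__":
    print(ytmt_exchange([1.0, -2.0], [-0.5, 3.0]))  # ([0.5, 0.0], [0.0, 1.0])
```

Because each input splits exactly into its kept and discarded parts, the two outputs still sum to the two inputs: in this sketch no activation is lost, only rerouted to the other stream.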
DiskChunGS: Large-Scale 3D Gaussian SLAM Through Chunk-Based Memory Management
Feldmann, Casimir, Wilder-Smith, Maximum, Patil, Vaishakh, Oechsle, Michael, Niemeyer, Michael, Tateno, Keisuke, Hutter, Marco
Abstract--Recent advances in 3D Gaussian Splatting (3DGS) have demonstrated impressive results for novel view synthesis with real-time rendering capabilities. However, integrating 3DGS with SLAM systems faces a fundamental scalability limitation: methods are constrained by GPU memory capacity, restricting reconstruction to small-scale environments. We present DiskChunGS, a scalable 3DGS SLAM system that overcomes this bottleneck through an out-of-core approach that partitions scenes into spatial chunks and maintains only active regions in GPU memory while storing inactive areas on disk. Our architecture integrates seamlessly with existing SLAM frameworks for pose estimation and loop closure, enabling globally consistent reconstruction at scale. Our method uniquely completes all 11 KITTI sequences without memory failures while achieving superior visual quality, demonstrating that algorithmic innovation can overcome the memory constraints that have limited previous 3DGS SLAM methods.
Recent advances in neural representations for 3D scene reconstruction have revolutionized novel view synthesis, with 3D Gaussian Splatting (3DGS) [1] emerging as an exceptionally efficient and high-quality approach. Unlike volume-based methods [2]-[4] that struggle with rendering speed due to expensive ray marching, 3DGS provides real-time rendering capabilities while maintaining impressive visual fidelity.
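The out-of-core idea in the abstract (spatial chunks, only active ones kept in GPU memory, the rest on disk) can be sketched with a small cache. This is a minimal illustration under stated assumptions, not DiskChunGS's implementation: the class name `ChunkStore`, the cubic grid with side `chunk_size`, the resident budget `max_resident`, the least-recently-used eviction policy, and pickle files as the disk format are all hypothetical, and a Python dict stands in for GPU memory.

```python
import math
import os
import pickle
import tempfile
from collections import OrderedDict


class ChunkStore:
    """Hypothetical out-of-core store: a few chunks resident, the rest on disk.

    'Resident' stands in for GPU memory. Eviction is least-recently-used,
    an assumed policy. max_resident must be at least 1.
    """

    def __init__(self, chunk_size, max_resident, directory):
        self.chunk_size = chunk_size
        self.max_resident = max_resident
        self.directory = directory
        self.resident = OrderedDict()  # chunk key -> list of Gaussians

    def key_for(self, position):
        """Map a 3D position to the integer grid cell (chunk) containing it."""
        return tuple(math.floor(c / self.chunk_size) for c in position)

    def _path(self, key):
        return os.path.join(self.directory, "chunk_%d_%d_%d.pkl" % key)

    def get(self, key):
        """Return a chunk, loading it from disk (or creating it) if needed."""
        if key in self.resident:
            self.resident.move_to_end(key)  # mark as most recently used
            return self.resident[key]
        path = self._path(key)
        if os.path.exists(path):
            with open(path, "rb") as f:
                chunk = pickle.load(f)
        else:
            chunk = []
        self.resident[key] = chunk
        self._evict_if_needed()
        return chunk

    def _evict_if_needed(self):
        """Write least-recently-used chunks to disk until within budget."""
        while len(self.resident) > self.max_resident:
            key, chunk = self.resident.popitem(last=False)
            with open(self._path(key), "wb") as f:
                pickle.dump(chunk, f)

    def add_gaussian(self, position, gaussian):
        """Append a Gaussian to the chunk that contains its position."""
        self.get(self.key_for(position)).append(gaussian)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        store = ChunkStore(chunk_size=10.0, max_resident=2, directory=tmp)
        for x in (1.0, 15.0, -5.0):
            store.add_gaussian((x, 0.0, 0.0), {"mean": (x, 0.0, 0.0)})
        print(sorted(store.resident))  # [(-1, 0, 0), (1, 0, 0)]
```

The design keeps memory use bounded by `max_resident` chunks regardless of scene size; revisiting an evicted region (for example after a loop closure) simply reloads its chunk from disk.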
Digital Elevation Model Estimation from RGB Satellite Imagery using Generative Deep Learning
Madani, Alif Ilham, Kuswati, Riska A., Lechner, Alex M., Saputra, Muhamad Risqi U.
Digital Elevation Models (DEMs) are vital datasets for geospatial applications such as hydrological modeling and environmental monitoring. However, conventional methods for generating DEMs, such as LiDAR and photogrammetry, require specific types of data that are often inaccessible in resource-constrained settings. To alleviate this problem, this study proposes an approach to generate DEMs from freely available RGB satellite imagery using generative deep learning, specifically a conditional Generative Adversarial Network (GAN). We first developed a global dataset consisting of 12K RGB-DEM pairs using Landsat satellite imagery and NASA's SRTM digital elevation data, both from the year 2000. A unique preprocessing pipeline was implemented to select high-quality, cloud-free regions and aggregate normalized RGB composites from Landsat imagery. Additionally, the model was trained in a two-stage process, where it was first trained on the complete dataset and then fine-tuned on high-quality samples filtered by Structural Similarity Index Measure (SSIM) values to improve performance on challenging terrains. The results demonstrate promising performance in mountainous regions, achieving an overall mean root-mean-square error (RMSE) of 0.4671 and a mean SSIM score of 0.2065 (scale -1 to 1), while highlighting limitations in lowland and residential areas. This study underscores the importance of meticulous preprocessing and iterative refinement in generative modeling for DEM generation, offering a cost-effective and adaptive alternative to conventional methods while emphasizing the challenge of generalization across diverse terrains worldwide.
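The RMSE and SSIM figures above, and the SSIM-based selection of fine-tuning samples, can be illustrated with a short sketch. It uses the global (single-window) form of SSIM with the usual constants K1 = 0.01 and K2 = 0.03; the standard metric instead averages the same statistic over local sliding windows, so values will differ from library implementations. The source does not say which image pairs were scored for filtering or what cutoff was used, so `select_for_finetuning` assumes pairs of first-stage predictions and reference DEMs and takes a caller-supplied threshold; both choices are hypothetical.

```python
import math


def rmse(pred, target):
    """Root-mean-square error between two equally sized flat arrays."""
    return math.sqrt(sum((p - t) ** 2 for p, t in zip(pred, target)) / len(pred))


def global_ssim(x, y, data_range, k1=0.01, k2=0.03):
    """Single-window SSIM over the whole image (no sliding window).

    Standard SSIM averages this statistic over local windows; this global
    form is a simplification. The result lies in [-1, 1].
    """
    n = len(x)
    mu_x, mu_y = sum(x) / n, sum(y) / n
    var_x = sum((a - mu_x) ** 2 for a in x) / n
    var_y = sum((b - mu_y) ** 2 for b in y) / n
    cov = sum((a - mu_x) * (b - mu_y) for a, b in zip(x, y)) / n
    c1, c2 = (k1 * data_range) ** 2, (k2 * data_range) ** 2
    return ((2 * mu_x * mu_y + c1) * (2 * cov + c2)) / (
        (mu_x ** 2 + mu_y ** 2 + c1) * (var_x + var_y + c2)
    )


def select_for_finetuning(samples, threshold, data_range):
    """Keep (prediction, target) pairs whose SSIM reaches a hypothetical cutoff."""
    return [
        (pred, target)
        for pred, target in samples
        if global_ssim(pred, target, data_range) >= threshold
    ]
```

An SSIM of 1 means identical structure, while negative values indicate anti-correlated structure, which is why a mean near 0.2 on a -1 to 1 scale still leaves substantial room for improvement.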
MCGS-SLAM: A Multi-Camera SLAM Framework Using Gaussian Splatting for High-Fidelity Mapping
Cao, Zhihao, Wu, Hanyu, Tang, Li Wa, Luo, Zizhou, Zhu, Zihan, Zhang, Wei, Pollefeys, Marc, Oswald, Martin R.
Figure 1: MCGS-SLAM synchronizes RGB inputs from the front, left, and right cameras of the multi-camera rig in the Waymo dataset and fuses them into a unified 3D Gaussian Splatting map. The system performs real-time tracking and mapping, enabling high-fidelity reconstruction of both color and depth views from each individual camera. Through joint multi-camera optimization, MCGS-SLAM ensures accurate pose and geometry alignment, while supporting comprehensive multi-view rendering for photorealistic visualization. Abstract-- Recent progress in dense SLAM has primarily targeted monocular setups, often at the expense of robustness and geometric coverage. We present MCGS-SLAM, the first purely RGB-based multi-camera SLAM system built on 3D Gaussian Splatting (3DGS). A multi-camera bundle adjustment (MCBA) jointly refines poses and depths via dense photometric and geometric residuals, while a scale consistency module enforces metric alignment across views using low-rank priors. The system supports RGB input and maintains real-time performance at large scale.
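To illustrate what keeping depth at a consistent scale across cameras involves, the sketch below uses a swapped-in technique that is much simpler than the paper's: a closed-form least-squares scale that aligns one camera's depths to a reference camera's over pixels both observe. It does not model the low-rank priors mentioned above, and the pixel correspondences between cameras are assumed to be given as two aligned lists.

```python
def least_squares_scale(depth_ref, depth_other):
    """Scale s minimizing sum((s * d_other - d_ref)^2) over shared pixels.

    Setting the derivative to zero gives s = sum(d_ref * d_other) / sum(d_other^2).
    """
    num = sum(a * b for a, b in zip(depth_ref, depth_other))
    den = sum(b * b for b in depth_other)
    if den == 0.0:
        raise ValueError("depth_other must contain a nonzero depth")
    return num / den


def align(depth_ref, depth_other):
    """Return the scale and the rescaled depths of the second camera."""
    s = least_squares_scale(depth_ref, depth_other)
    return s, [s * b for b in depth_other]


if __name__ == "__main__":
    print(align([2.0, 4.0, 6.0], [1.0, 2.0, 3.0]))  # (2.0, [2.0, 4.0, 6.0])
```

A single global scale per camera is the simplest possible consistency model; it cannot correct depth errors that vary across the image, which is the kind of gap richer priors are meant to address.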
OmniMap: A General Mapping Framework Integrating Optics, Geometry, and Semantics
Deng, Yinan, Yue, Yufeng, Dou, Jianyu, Zhao, Jingyu, Wang, Jiahui, Tang, Yujie, Yang, Yi, Fu, Mengyin
Figure 1: We introduce OmniMap, a general online mapping framework integrating optics, geometry, and semantics. OmniMap incrementally maintains an open-vocabulary instance-level voxel representation and a 3DGS (3D Gaussian Splatting) representation, from which color and geometric meshes are derived. OmniMap supports multi-modal rendering (RGB / depth / normal / instance), and achieves state-of-the-art performance in rendering fidelity, mesh quality, and semantic understanding. This holistic framework enables versatile support for a wide range of downstream applications.
Abstract--Robotic systems demand accurate and comprehensive 3D environment perception, requiring simultaneous capture of photo-realistic appearance (optical), precise layout shape (geometric), and open-vocabulary scene understanding (semantic). Existing methods typically fulfill these requirements only partially, exhibiting optical blurring, geometric irregularities, and semantic ambiguities. To address these challenges, we propose OmniMap. Overall, OmniMap represents the first online mapping framework that simultaneously captures optical, geometric, and semantic scene attributes while maintaining real-time performance and model compactness. At the implementation level, OmniMap identifies key challenges across different modalities and introduces several innovations: adaptive camera modeling for motion blur and exposure compensation, hybrid incremental representation with normal constraints, and probabilistic fusion for robust instance-level understanding. Extensive experiments show OmniMap's superior performance in rendering fidelity, geometric accuracy, and zero-shot semantic segmentation compared to state-of-the-art methods across diverse scenes. The framework's versatility is further evidenced through a variety of downstream applications, including multi-domain scene Q&A, interactive editing, perception-guided manipulation, and map-assisted navigation.
This work is supported by the National Natural Science Foundation of China under Grants 92370203, 62473050, and 62233002, and by the Beijing Natural Science Foundation Undergraduate Research Program QY24180. Mengyin Fu is with the School of Automation, Beijing Institute of Technology, Beijing 100081, China, and the School of Automation, Nanjing University of Science and Technology, Nanjing 210018, China (e-mail: fumy@bit.edu.cn). The project page of OmniMap is available at https://omni-map.github.io/.
The quality of a robot's 3D environmental representation, measured by its accuracy and dimensionality, fundamentally affects the robot's task performance and execution capabilities.
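As a generic illustration of probabilistic fusion for instance-level labels, the sketch below applies a textbook recursive Bayesian update to a single voxel: each frame contributes per-label likelihoods that are multiplied into the running belief and renormalized. This is not OmniMap's formulation, which the source does not detail; the class name `VoxelLabelBelief`, the label names, the likelihood values, and the 1e-12 floor on likelihoods are all assumptions.

```python
import math


class VoxelLabelBelief:
    """Recursive Bayesian fusion of per-frame class scores for one voxel.

    Stores log-probabilities to avoid underflow after many updates.
    """

    def __init__(self, labels):
        self.labels = list(labels)
        uniform = -math.log(len(self.labels))  # uniform prior
        self.log_probs = {label: uniform for label in self.labels}

    def update(self, likelihoods):
        """Multiply in one observation's per-label likelihoods and renormalize."""
        for label in self.labels:
            # Floor avoids log(0) when a frame assigns a label zero likelihood.
            self.log_probs[label] += math.log(max(likelihoods[label], 1e-12))
        # Log-sum-exp normalization keeps the beliefs a valid distribution.
        top = max(self.log_probs.values())
        log_z = top + math.log(sum(math.exp(v - top) for v in self.log_probs.values()))
        for label in self.labels:
            self.log_probs[label] -= log_z

    def probabilities(self):
        return {label: math.exp(v) for label, v in self.log_probs.items()}

    def most_likely(self):
        return max(self.log_probs, key=self.log_probs.get)


if __name__ == "__main__":
    belief = VoxelLabelBelief(["chair", "table"])
    for obs in ({"chair": 0.8, "table": 0.2},
                {"chair": 0.8, "table": 0.2},
                {"chair": 0.1, "table": 0.9}):  # one contradicting frame
        belief.update(obs)
    print(belief.most_likely())  # chair
```

In the example, two confident "chair" observations outweigh one contradicting frame, which is the kind of robustness to per-frame segmentation noise that such fusion aims for.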